Search CORE

3 research outputs found

Segmentation de textes arabes en unités discursives minimales

Author: Benamara Zitoune Farah
Hadrich Belguith Lamia
Keskes Iskandar
Publication venue: LINA - Laboratoire d'Informatique de Nantes Atlantique
Publication date: 01/01/2013
Field of study

La segmentation d'un texte en Unités Discursives Minimales (UDM) a pour but de découper le texte en segments qui ne se chevauchent pas. Ces segments sont ensuite reliés entre eux afin de construire la structure discursive d'un texte. La plupart des approches existantes utilisent une analyse syntaxique extensive. Malheureusement, certaines langues ne disposent pas d'analyseur syntaxique robuste. Dans cet article, nous étudions la faisabilité de la segmentation discursive de textes arabes en nous basant sur une approche d'apprentissage supervisée qui prédit les UDM et les UDM imbriqués. La performance de notre segmentation a été évaluée sur deux genres de corpus: des textes de livres de l'enseignement secondaire et des textes du corpus Arabic Treebank. Nous montrons que la combinaison de traits typographiques, morphologiques et lexicaux permet une bonne reconnaissance des bornes de segments. De plus, nous montrons que l'ajout de traits syntaxiques n'améliore pas les performances de notre segmentation

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Hal-Diderot

Splitting Arabic Texts into Elementary Discourse Units

Author: Abdul-Mageed M.
Abu-Jbara A.
Afantenos S.
Afantenos S. D.
Al-Saif A.
Al-Saif A.
Belguith H. L.
Boujelben I.
Charoensuk J.
Da Cunha I.
Darwish K.
Diab M.
Diab M.
Eskander R.
Farah Benamara Zitoune
Fisher S.
Green S.
Gridach M.
Habash N.
Iskandar Keskes
Kamp H.
Keskes I.
Khalifa I.
Lamia Hadrich Belguith
Lüngen H.
Maamouri M.
Maamouri M.
Mourad A.
Nivre J.
Polanyi L.
Prasad A.
Sadat F.
Sawalha M.
Subba R.
Sumita K.
Tofiloski M.
Trigui O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014
Field of study

International audienceIn this article, we propose the first work that investigates the feasibility of Arabic discourse segmentation into elementary discourse units within the segmented discourse representation theory framework. We first describe our annotation scheme that defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieve good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Automatic detection of irony

Author: Benamara Zitoune Farah
Karoui Jihen
Moriceau Veronique
Publication venue: 'Wiley'
Publication date: 01/01/2019
Field of study

CERN Document Server